VAST Challenge 2021
Mini-Challenge 3
No
80 hours
Yes
Please limit your answer to 8 images and 500 words.
The messages of type “ccdata” come from transcripts and contain event reports. Some events are routine (low risk) and others are emergencies, ranging from medium to high risk for the safety of the Abila population.
The messages of type “mbdata” come from microblogs and contain informative reports (from government, media accounts, and eyewitnesses, for example) as well as chatter/junk/spam. There are re-posts for both types of messages. One way to distinguish between those two groups of messages is to identify user names that may be more trustworthy than others.
Government entities such as the fire department and police have accounts (AbilaFireDept and AbilaPoliceDepartment, respectively) that release messages such as “Abila Fire Department announces expansion of evacuation area #AFD”.
Some of the media accounts like AbilaPost and InternationalNews post news about events. For example: “POK rally expected to draw in excess of 1000 people”. There are also groups of messages from eyewitnesses at each major event represented in the data. The account anaregent was at the rally, judging by this message: “Good turnout tonight. there must be 2000 people here!”
The accounts truccotrucco, Simon_Hamaeth and roger_roger seem to be hostages at the Gelato Galore. One message from truccotrucco reads “why didnt i stay at the rally? im trapped in here the van is right outsdie the door”. The account megaMan seems to be outside the standoff with messages such as “sorry! its hard to report everything when yr behind a mailbox” and “Been livecasting this for about 1/2 hour - not sure how it will end”.
In contrast to the first group, junkman377 and junkman995 are examples of accounts that post only advertisements, such as “How's your credit rating? Need a boost?”.
Other accounts post random thoughts, such as the account Clevvah4Evah posting about grammar and the account KronosQuoth posting cliches like “If you're going through hell keep going”. Interestingly, KronosQuoth is the most frequent poster of messages by a large margin.
Some accounts only re-post, which means they add no new information. In the image below, the software automatically grouped together AbilaFireDept messages with re-posts by other authors.
In the data there were a total of 1000 re-posts made by just 40 users.
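A tally of this kind can be sketched in a few lines. The sketch below is illustrative only: it assumes that re-posts are marked by a leading "RT @name" prefix and that each message is available as an (author, text) pair. The accounts userA and userB and the prefixed sample messages are hypothetical, and this is not the software used for the grouping described above.

```python
import re
from collections import Counter, defaultdict

# Assumption: a re-post starts with "RT @<account>" (optionally followed by ":").
RT_PREFIX = re.compile(r"^\s*RT @\w+:?\s*", re.IGNORECASE)

def summarize_reposts(messages):
    """messages: iterable of (author, text) pairs.

    Returns (per_author, groups) where per_author counts re-posts per
    author and groups maps each normalized message body to the authors
    who posted or re-posted it, in input order.
    """
    per_author = Counter()
    groups = defaultdict(list)
    for author, text in messages:
        body, n_stripped = RT_PREFIX.subn("", text)
        if n_stripped:
            per_author[author] += 1
        groups[body.strip().lower()].append(author)
    return per_author, groups

# Hypothetical sample: userA and userB are made-up account names.
sample = [
    ("AbilaFireDept", "Abila Fire Department announces expansion of evacuation area #AFD"),
    ("userA", "RT @AbilaFireDept Abila Fire Department announces expansion of evacuation area #AFD"),
    ("userB", "RT @AbilaFireDept Abila Fire Department announces expansion of evacuation area #AFD"),
    ("userA", "RT @AbilaPost POK rally expected to draw in excess of 1000 people"),
]
per_author, groups = summarize_reposts(sample)
total_reposts = sum(per_author.values())
```

Here total_reposts is the overall re-post count and len(per_author) is the number of distinct re-posting accounts, which are the two figures reported above.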
The term scores from the top 25 authors allowed the creation of a network that shows pockets of similarity. For instance, the group consisting of Clevvah4Evah, POK, FriendsOfKronos, and Victor-E tends to share similar content, and its members often reference one another.
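One way such a similarity network could be derived is sketched below. The scoring scheme is an assumption: the sketch uses smoothed TF-IDF weights and cosine similarity, which may differ from the term scores actually used, and the author names, tokens, and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(tokens_by_author):
    """tokens_by_author: {author: [token, ...]} -> {author: {term: weight}}."""
    n = len(tokens_by_author)
    doc_freq = Counter()
    for tokens in tokens_by_author.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for author, tokens in tokens_by_author.items():
        tf = Counter(tokens)
        # Smoothed inverse document frequency (an assumed weighting).
        vectors[author] = {
            t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
            for t, c in tf.items()
        }
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_edges(tokens_by_author, threshold=0.3):
    """Return (author_a, author_b, similarity) for pairs at or above threshold."""
    vectors = tfidf_vectors(tokens_by_author)
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges

# Hypothetical authors and tokens, for illustration only.
toy = {
    "author1": ["rally", "pok", "crowd"],
    "author2": ["rally", "pok", "speech"],
    "author3": ["credit", "rating", "boost"],
}
edges = similarity_edges(toy)
```

Each returned edge links two authors whose vocabulary overlaps; clusters of mutually linked authors correspond to the pockets of similarity described above.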
Please limit your answer to 10 images and 1000 words.
To model the answer to this question, we identified keywords for the three categories of risk (low, medium, and high) and iteratively refined the model through visualizations. We also uncovered noteworthy events using our automated categorization process (a rally, a fire, an explosion, reckless driving, running a red light, bicycle/pedestrian hit & run, and a hostage situation).
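A keyword-based categorization of this kind might look like the minimal sketch below. The keyword lists and the assignment of terms to risk levels are assumptions made for illustration; the lists actually used, and the refinements made through visualization, are not reproduced here.

```python
import re

# Hypothetical keyword lists; the real model's keywords are not given in the text.
RISK_KEYWORDS = {
    "high": {"fire", "explosion", "hostage", "evacuation"},
    "medium": {"reckless", "hit", "collision"},
    "low": {"rally", "traffic", "stop"},
}
RISK_ORDER = ["high", "medium", "low"]  # the most severe match wins

def risk_level(text):
    """Return the highest risk level whose keywords appear in text, else 'none'."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    for level in RISK_ORDER:
        if tokens & RISK_KEYWORDS[level]:
            return level
    return "none"

levels = [
    risk_level("Abila Fire Department announces expansion of evacuation area #AFD"),
    risk_level("bicycle hit and run reported"),
    risk_level("POK rally expected to draw in excess of 1000 people"),
    risk_level("How's your credit rating? Need a boost?"),
]
```

Checking levels from most to least severe means a message that mentions both a rally and a fire is treated as high risk, which errs on the side of caution.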
We used the risk-level model to pinpoint when the risk is rising or falling. The peaks in the visualization represent the fire (first peak), a car being hit (second peak), the hostage situation (all remaining peaks in the latter half of the timeline), and an explosion (the very last peak). Heightened activity is shown later in the night.
Using an unsupervised approach, it is possible to monitor messages in real time at a given interval to detect issues that may not have been called in to the police department. In our method we derive descriptive terms for each 5-minute interval. Each interval is then compared against the one that follows it to determine whether word usage has changed significantly from one 5-minute interval to the next.
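The interval comparison can be sketched as follows. This is a simplified illustration under stated assumptions: raw term counts stand in for the descriptive terms, cosine distance stands in for the significance test, and the 0.5 threshold, the timestamps, and the sample messages are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from datetime import datetime

def interval_start(ts, minutes=5):
    """Round a datetime down to the start of its interval (minutes must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)

def cosine_distance(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if not norm_a or not norm_b:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def flag_changes(messages, minutes=5, threshold=0.5):
    """messages: iterable of (datetime, text).

    Returns (interval_start, top_new_terms) for each interval whose word
    usage differs from the previous non-empty interval by at least threshold.
    """
    bins = defaultdict(Counter)
    for ts, text in messages:
        bins[interval_start(ts, minutes)].update(re.findall(r"[a-z]+", text.lower()))
    starts = sorted(bins)
    flagged = []
    for prev, cur in zip(starts, starts[1:]):
        if cosine_distance(bins[prev], bins[cur]) >= threshold:
            gained = bins[cur] - bins[prev]  # terms that grew in the new interval
            flagged.append((cur, [term for term, _ in gained.most_common(3)]))
    return flagged

# Hypothetical messages on an arbitrary date, for illustration only.
day = (2021, 1, 1)
sample = [
    (datetime(*day, 18, 31), "traffic stop on main street"),
    (datetime(*day, 18, 33), "traffic stop near park"),
    (datetime(*day, 18, 36), "traffic stop downtown"),
    (datetime(*day, 18, 41), "fire at the building"),
    (datetime(*day, 18, 42), "fire trucks arriving building fire"),
]
flagged = flag_changes(sample)
```

In this toy stream the two intervals dominated by traffic stops resemble each other, while the interval in which "fire" appears is flagged. A production version would likely also filter stop words and require a minimum message volume before flagging.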
This monitoring can be successful if enough people message and re-post activity as it occurs. For example, within 5 minutes an automated system could flag events like the following:
Please limit your answer to 8 images and 500 words.
By creating a dashboard of message frequency that accounts for the modeled risk level, we give first responders a way to monitor potential needs and prioritize their activities. Selecting a region of high activity in a particular risk category displays the location of the event and a list of the messages, which helps in understanding the nature of the event. Arranging the messages in time order by default helps to establish the narrative.
Viewing all messages at a particular time provides a broad view of the noteworthy message content, which helps avoid missing activity simply because it was not flagged as high risk. For example, the building fire at the Dancing Dolphin is shown to start around 6:4PM amidst traffic stops and other activity.
Viewing regions where a particular risk category shows an increase in activity is the fastest way to narrow down a potential area to send aid, and it works for any risk category.
By examining the peaks in high-risk messages, we can see that the evacuation of the Dancing Dolphin begins around 7:20PM.
While the emergency fire situation is ongoing, a hostage situation can be seen to cause another spike in high-risk messages around 7:40PM.
Developments and updates shared about these ongoing situations also generate more (though smaller) spikes in messages identified as high risk.
The emergency situation in Abila ends with an explosion at the Dancing Dolphin at around 9:30PM.
We did not solve this mini-challenge in 2014.